Three Case Studies Using Agglomerative Clustering
Authors
Abstract
Finding a clustering in a data set is a challenging task, since algorithms usually depend on the adopted inter-cluster distance as well as on the employed definition of cluster diameter. This paper examines the well-known agglomerative clustering algorithm AGNES (Agglomerative Nesting) with regard to its performance in three case studies: datasets formed by clusters of different sizes, with uneven inter-cluster distances, and with uneven diameters. Clustering results are evaluated using three well-known indexes: Dunn, Davies-Bouldin and Rand. Results obtained with K-means are used for comparison. The results suggest that AGNES and K-means perform similarly when identifying clusters with different sizes and inter-cluster distances; however, AGNES obtained the best results when dealing with clusters that differ in both size and diameter.
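As an illustration of how two of the indexes named above relate to inter-cluster distance and cluster diameter, the following is a minimal sketch in plain Python. It is not the paper's implementation. It assumes one common variant of the Dunn index (smallest point-to-point distance between different clusters, divided by the largest point-to-point distance inside a cluster) and the standard pair-counting Rand index. The function names and the toy data are hypothetical.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def dunn_index(points, labels):
    """Dunn index (assumed variant): min between-cluster distance
    divided by max within-cluster distance (cluster diameter)."""
    min_between = float("inf")
    max_diameter = 0.0
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        d = dist(p, q)
        if lp == lq:
            max_diameter = max(max_diameter, d)  # same cluster: candidate diameter
        else:
            min_between = min(min_between, d)    # different clusters: candidate separation
    if max_diameter == 0.0:
        raise ValueError("every cluster has zero diameter; Dunn index is undefined")
    return min_between / max_diameter


def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two labelings agree
    (both put the pair together, or both put it apart)."""
    agree = 0
    total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        agree += same_a == same_b
        total += 1
    return agree / total


# Hypothetical toy data: two tight, well-separated groups of three points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
truth = [0, 0, 0, 1, 1, 1]
good = [1, 1, 1, 0, 0, 0]   # same partition, different label names
poor = [0, 0, 1, 1, 1, 1]   # one point assigned to the wrong group

dunn_good = dunn_index(points, good)
dunn_poor = dunn_index(points, poor)
rand_good = rand_index(truth, good)
rand_poor = rand_index(truth, poor)
```

Under these assumptions a higher Dunn value indicates compact, well-separated clusters, and the Rand index is unaffected by how cluster labels are named, which is why `good` matches `truth` exactly despite the swapped labels. The Davies-Bouldin index is omitted here to keep the sketch short.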
Similar resources
Exploiting parallelism to support scalable hierarchical clustering
A distributed memory parallel version of the group average Hierarchical Agglomerative Clustering algorithm is proposed to enable scaling the document clustering problem to large collections. Using standard message passing operations reduces interprocess communication while maintaining efficient load balancing. In a series of experiments using a subset of a standard TREC test collection, our par...
Agglomerative Hierarchical Clustering using AVL tree in the case of single-linkage clustering method
The hierarchy is often used to infer knowledge from groups of items and relations in varying granularities. Hierarchical clustering algorithms take an input of pairwise data-item similarities and output a hierarchy of the data-items. This paper presents Bidirectional agglomerative hierarchical clustering to create a hierarchy bottom-up, by iteratively merging the closest pair of data-items into...
Agglomerative Clustering Using Asymmetric Similarities
Algorithms of agglomerative hierarchical clustering using asymmetric similarity measures are studied. Two different measures between two clusters are proposed, one of which generalizes the average linkage for symmetric similarity measures. Asymmetric dendrogram representation is considered after foregoing studies. It is proved that the proposed linkage methods for asymmetric measures have no re...
A Relative Approach to Hierarchical Clustering
This paper presents a new approach to agglomerative hierarchical clustering. Classical hierarchical clustering algorithms are based on metrics which only consider the absolute distance between two clusters, merging the pair of clusters with highest absolute similarity. We propose a relative dissimilarity measure, which considers not only the distance between a pair of clusters, but also how dis...
An Empirical Comparison of Distance Measures for Multivariate Time Series Clustering
Multivariate time series (MTS) data are ubiquitous in science and daily life, and how to measure their similarity is a core part of MTS analyzing process. Many of the research efforts in this context have focused on proposing novel similarity measures for the underlying data. However, with the countless techniques to estimate similarity between MTS, this field suffers from a lack of comparative...